Near-Optimal No-Regret Learning Dynamics for General Convex Games
A recent line of work has established uncoupled learning dynamics such that, when employed by all players in a game, each player's regret after T repetitions grows polylogarithmically in T, an exponential improvement over the traditional guarantees within the no-regret framework. However, so far these results have been limited to certain classes of games with structured strategy spaces--such as normal-form and extensive-form games. Whether O(polylog T) regret bounds can be obtained for general convex and compact strategy sets--which arise in many fundamental models in economics and multiagent systems--while retaining efficient strategy updates is an important open question.
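To make the quantity at stake concrete, the sketch below (hypothetical data, not from the paper) computes a single player's external regret under linear losses over the probability simplex, the simplest convex and compact strategy set: cumulative loss minus that of the best fixed strategy in hindsight. "No-regret" means this gap grows sublinearly in T.

```python
import numpy as np

# Hypothetical data: T rounds of linear losses over the n-simplex.
rng = np.random.default_rng(0)
T, n = 1000, 4
losses = rng.random((T, n))          # adversary's loss vector each round
plays = np.full((T, n), 1.0 / n)     # player's mixed strategy (uniform here)

cum_player = float(np.sum(plays * losses))    # sum_t <x_t, l_t>
cum_best = float(np.min(losses.sum(axis=0)))  # best fixed vertex in hindsight
regret = cum_player - cum_best                # external regret after T rounds
print(regret)
```

For linear losses the best fixed strategy in hindsight is always a vertex of the simplex, which is why the minimum is taken over coordinates.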
No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multi-agent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium.
Review for NeurIPS paper: No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium
Summary and Contributions: The authors provide a regret-minimisation approach to computing an analogue of correlated equilibria in extensive-form games, called extensive-form correlated equilibria (EFCE). It was previously unknown whether EFCE can be reached via uncoupled no-regret dynamics, as is the case for correlated equilibria in simultaneous games; the authors provide a way of doing so by introducing an appropriate notion of regret in the extensive-form setting (one that lines up with the notion of approximation in approximate EFCE) and demonstrating that achieving low regret in this sense suffices to obtain an approximate EFCE from the joint strategy profiles that arise as empirical frequencies of play. As mentioned before, the relevant notion of equilibrium in this setting is the extensive-form correlated equilibrium (EFCE). Such an equilibrium is a joint distribution over the space of all possible plans of all agents. As is typical in an extensive-form setting, a plan is simply a mapping from information sets to actions that dictates what an agent does in any given situation of play. In the simultaneous setting, a joint distribution over strategy profiles is a correlated equilibrium when no agent wishes to deviate from it, conditional on their realised recommendation and prior knowledge of the joint distribution of play.
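The conditional-deviation condition in the last sentence can be made concrete in a small normal-form example. The sketch below (the classic "Chicken" game, not taken from the paper under review) checks whether a joint distribution over action profiles is a correlated equilibrium: for each player, no deviation from a recommended action may yield a positive expected gain conditional on that recommendation.

```python
# Payoffs for "Chicken": D = dare, C = chicken.
U = {  # (row action, col action) -> (u1, u2)
    ("D", "D"): (0, 0), ("D", "C"): (7, 2),
    ("C", "D"): (2, 7), ("C", "C"): (6, 6),
}
# A well-known correlated equilibrium of this game.
mu = {("D", "C"): 1/3, ("C", "D"): 1/3, ("C", "C"): 1/3}

def is_correlated_eq(U, mu, actions=("D", "C"), tol=1e-9):
    for player in (0, 1):
        for rec in actions:          # recommended action
            for dev in actions:      # candidate deviation
                gain = 0.0
                for prof, p in mu.items():
                    if prof[player] != rec:
                        continue     # condition on receiving recommendation rec
                    swapped = list(prof); swapped[player] = dev
                    gain += p * (U[tuple(swapped)][player] - U[prof][player])
                if gain > tol:       # profitable conditional deviation found
                    return False
    return True

print(is_correlated_eq(U, mu))  # → True
```

Putting all mass on (C, C) instead fails the check, since a player recommended C would profit by deviating to D.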
On the Convergence of No-Regret Learning Dynamics in Time-Varying Games
Anagnostides, Ioannis; Panageas, Ioannis; Farina, Gabriele; Sandholm, Tuomas
Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of optimistic gradient descent (OGD) in time-varying games. Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games parameterized on natural variation measures of the sequence of games, subsuming known results for static games. Furthermore, we establish improved second-order variation bounds under strong convexity-concavity, as long as each game is repeated multiple times. Our results also apply to time-varying general-sum multi-player games via a bilinear formulation of correlated equilibria, which has novel implications for meta-learning and for obtaining refined variation-dependent regret bounds, addressing questions left open in prior papers. Finally, we leverage our framework to also provide new insights on dynamic regret guarantees in static games.
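The optimistic update the abstract builds on can be illustrated on the simplest saddle point. The sketch below (illustrative only; a static game rather than the paper's time-varying setting) runs optimistic gradient descent on f(x, y) = x * y, where x minimizes and y maximizes and the unique equilibrium is (0, 0). Plain gradient descent-ascent cycles here, while the optimistic correction 2*g_t - g_{t-1} makes the iterates spiral in.

```python
# Optimistic gradient descent (OGD) on the bilinear saddle point f(x, y) = x*y.
def ogd_bilinear(eta=0.2, steps=1000):
    x, y = 1.0, 1.0
    gx_prev, gy_prev = y, -x              # operator value at the initial point
    for _ in range(steps):
        gx, gy = y, -x                    # grad_x f = y; ascent on y gives -x
        x, y = (x - eta * (2 * gx - gx_prev),
                y - eta * (2 * gy - gy_prev))
        gx_prev, gy_prev = gx, gy
    return x, y

x, y = ogd_bilinear()
print(abs(x) + abs(y))  # distance to the equilibrium (0, 0): near zero
```

Replacing the update with the vanilla step x - eta * gx, y - eta * gy makes the iterates orbit the equilibrium instead of converging, which is the behavior optimism is designed to fix.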
Learning to Interact With Learning Agents
Singla, Adish (MPI-SWS) | Hassani, Hamed (University of Pennsylvania) | Krause, Andreas (ETH Zurich)
AI and machine learning methods are increasingly interacting with and seeking information from people, robots, and other learning agents. Consequently, the learning dynamics of these agents create fundamentally new challenges for existing methods. Motivated by the application of learning to offer personalized deals to users, we highlight these challenges by studying a variant of the framework of "online learning using expert advice with bandit feedback." In our setting, we consider each expert as a learning agent, seeking to more accurately reflect real-world applications. The bandit feedback leads to additional challenges in this setting: at time t, only the expert i_t that has been selected by the central algorithm (forecaster) receives feedback from the environment and gets to learn at this time. A natural question to ask is whether it is possible to be competitive with the best expert j* had it seen all the feedback, i.e., competitive with the policy of always selecting expert j*. We prove the following hardness result: without any coordination between the forecaster and the experts, it is impossible to design a forecaster achieving no-regret guarantees. We then consider a practical assumption allowing the forecaster to guide the learning process of the experts by blocking some of the feedback they observe from the environment, i.e., preventing the selected expert i_t from learning at time t on some time steps. With this additional coordination power, we design our forecaster LIL, which achieves no-regret guarantees, and we provide regret bounds dependent on the learning dynamics of the best expert j*.
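The interaction protocol above can be sketched as follows (all names hypothetical; this illustrates the setting only, not the paper's LIL forecaster): each expert is itself a learning agent, and under bandit feedback only the selected expert i_t observes the environment's reward and updates at round t.

```python
import random

class LearningExpert:
    """Toy learning agent: estimates its reward mean from the feedback
    it has personally observed."""
    def __init__(self):
        self.total, self.count = 0.0, 0

    def observe(self, reward):
        self.total += reward
        self.count += 1

    @property
    def estimate(self):
        return self.total / self.count if self.count else 0.0

def run(horizon=3000, seed=0):
    rng = random.Random(seed)
    means = [0.2, 0.5, 0.8]                  # hidden environment parameters
    experts = [LearningExpert() for _ in means]
    for _ in range(horizon):
        i_t = rng.randrange(len(experts))    # naive uniform forecaster
        reward = 1.0 if rng.random() < means[i_t] else 0.0
        experts[i_t].observe(reward)         # only expert i_t learns this round
    return [e.estimate for e in experts]

print(run())
```

Because each expert learns only on the rounds it is selected, its knowledge depends on the forecaster's choices -- precisely the coupling that makes the no-regret question above nontrivial.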